Coarse-to-Fine Spatial-Temporal Relationship Inference for Temporal Sentence Grounding

Abstract

Temporal sentence grounding aims to ground a query into a specific segment of the video. Previous methods follow the common equally-spaced frame selection mechanism for appearance and motion modeling, which fails to consider redundant and distracting visual information. There is also no guarantee that all meaningful frames can be obtained. Moreover, this task needs to detect location clues precisely from both the spatial and temporal dimensions, but the relationship between spatial-temporal semantic information and the query is still unexplored in existing methods. Inspired by human thinking patterns, we propose a Coarse-to-Fine Spatial-Temporal Relationship Inference (CFSTRI) network to progressively localize fine-grained activity segments. Firstly, we present a coarse-grained crucial frame selection module, where query-guided local difference context modeling from adjacent frames helps discriminate the coarse boundary locations relevant to the query semantics, and the soft assignment to a vector of locally aggregated descriptors is employed to enhance the representation of the selected frames. Then, we develop a fine-grained spatial-temporal relationship matching module to refine the coarse boundaries, which disentangles the spatial and temporal semantic information from the query to guide the excavation of grounding clues in the corresponding dimensions. Furthermore, we devise a gated graph convolution network to incorporate the spatial-temporal semantic information, leveraging a gate operation to highlight the frames referred to by the query and propagate the fused information on the graph. Extensive experiments on two benchmark datasets demonstrate that our CFSTRI network significantly outperforms most state-of-the-art methods.
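
The abstract mentions soft assignment to a vector of locally aggregated descriptors (VLAD) as a way to enhance frame representations. The following is a minimal NumPy sketch of generic soft-assignment VLAD pooling, not the authors' implementation: the function name `soft_assign_vlad`, the sharpness parameter `alpha`, the distance-based softmax, and the random inputs are all assumptions made for illustration. In a trained network the cluster centers and assignment weights would normally be learned.

```python
import numpy as np

def soft_assign_vlad(features, centers, alpha=1.0):
    """Pool frame features of shape (T, D) into one (K * D,) descriptor.

    Hypothetical helper: each feature is softly assigned to K cluster
    centers of shape (K, D), and the assignment-weighted residuals are
    summed per cluster.
    """
    # Residual of every feature to every center: shape (T, K, D)
    diff = features[:, None, :] - centers[None, :, :]
    # Squared Euclidean distance to each center: shape (T, K)
    sq_dist = np.sum(diff ** 2, axis=2)

    # Soft assignment: softmax over clusters of the negative scaled distance
    logits = -alpha * sq_dist
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Assignment-weighted sum of residuals per cluster: shape (K, D)
    vlad = np.einsum("tk,tkd->kd", weights, diff)

    # Per-cluster L2 normalization followed by global L2 normalization
    vlad = vlad / np.maximum(np.linalg.norm(vlad, axis=1, keepdims=True), 1e-12)
    flat = vlad.reshape(-1)
    return flat / max(np.linalg.norm(flat), 1e-12)

# Illustrative inputs only: 8 frames with 4-dim features, 3 cluster centers
rng = np.random.default_rng(0)
frame_features = rng.normal(size=(8, 4))
cluster_centers = rng.normal(size=(3, 4))
descriptor = soft_assign_vlad(frame_features, cluster_centers)
```

The descriptor has a fixed length of K * D regardless of how many frames are pooled, which is the property that makes this kind of aggregation convenient for variable-length sets of selected frames.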

Similar resources

From Coarse to Fine? Spatial and Temporal Dynamics of Cortical Face Processing

Primary vision segregates information along 2 main dimensions: orientation and spatial frequency (SF). An important question is how this primary visual information is integrated to support high-level representations. It is generally assumed that the information carried by different SF is combined following a coarse-to-fine sequence. We directly addressed this assumption by investigating how the...

Spatial-Temporal Trend Modeling for Ozone Concentration in Tehran City

Fitting a suitable covariance function for the correlation structure of spatial-temporal data requires de-trending the data. In this article, some potential models for the spatial-temporal trend are presented. Eventually, the best model is identified for de-trending tropospheric ozone concentration data for the city of Tehran (the capital of Iran). By using the selected trend model, some ...

Hemispheric specialization of human inferior temporal cortex during coarse-to-fine and fine-to-coarse analysis of natural visual scenes.

Recent models of visual recognition have suggested that perceptual analysis may start with a parallel extraction of different spatial frequencies (SF), using a preferential coarse-to-fine (low-to-high SF) sequence of processing. A rapid extraction of low spatial frequency (LSF) information may thus provide an initial and crude parsing of the visual scene, subsequently refined by slow but more d...

Measuring spatial - temporal of Yazd urban form using spatial metrics

Urban form can be affected by diverse factors at different times. Socio-economic, political, and physical factors are among the main contributors. One of the most important challenges for urban planners is therefore measuring and identifying the urban development pattern in order to direct and strengthen it toward a sustainable pattern. The case study of the present paper is the ...

Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3095229